32 research outputs found
Image Retrieval using Textual Cues
International audienceWe present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method, where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M, we introduce
CVIT-MT Systems for WAT-2018
This document describes the machine translation system used in the
submissions of IIIT-Hyderabad CVIT-MT for the WAT-2018 English-Hindi
translation task. Performance is evaluated on the associated corpus provided by
the organizers. We experimented with convolutional sequence to sequence
architectures. We also train with additional data obtained through
backtranslation
Scene Text Recognition using Higher Order Language Priors
International audienceThe problem of recognizing text in images taken in the wild has gained significant attention from the computer vision community in recent years. Contrary to recognition of printed documents, recognizing scene text is a challenging problem. We focus on the problem of recognizing text extracted from natural scene images and the web. Significant attempts have been made to address this problem in the recent past. However, many of these works benefit from the availability of strong context, which naturally limits their applicability. In this work we present a framework that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary. We show experimental results on publicly available datasets. Furthermore, we introduce a large challenging word dataset with five thousand words to evaluate various steps of our method exhaustively. The main contributions of this work are: (1) We present a framework, which incorporates higher order statistical language models to recognize words in an unconstrained manner (i.e. we overcome the need for restricted word lists, and instead use an English dictionary to compute the priors). (2) We achieve significant improvement (more than 20%) in word recognition accuracies without using a restricted word list. (3) We introduce a large word recognition dataset (atleast 5 times larger than other public datasets) with character level annotation and benchmark it
An MRF Model for Binarization of Natural Scene Text
International audienceInspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels in a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the binarization (or labelling) is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in foreground and background colours as we use a Gaussian Mixture Model in the energy function. In addition, our algorithm is efficient to compute, and adapts to a variety of document images. We show results on word images from the challenging ICDAR 2003 dataset, and compare our performance with previously reported methods. Our approach shows significant improvement in pixel level accuracy as well as OCR accuracy
Scene Text Recognition and Retrieval for Large Lexicons
International audienceIn this paper we propose a framework for recognition and retrieval tasks in the context of scene text images. In contrast to many of the recent works, we focus on the case where an image-specific list of words, known as the small lexicon setting, is unavailable. We present a conditional random field model defined on potential character locations and the interactions between them. Observing that the interaction potentials computed in the large lexicon setting are less effective than in the case of a small lexicon, we propose an iterative method, which alternates between finding the most likely solution and refining the interaction po-tentials. We evaluate our method on public datasets and show that it improves over baseline and state-of-the-art approaches. For example, we obtain nearly 15% improvement in recognition accuracy and precision for our retrieval task over baseline methods on the IIIT-5K word dataset, with a large lexicon containing 0.5 million words
All-sky search for long-duration gravitational wave transients with initial LIGO
We present the results of a search for long-duration gravitational wave transients in two sets of data collected by the LIGO Hanford and LIGO Livingston detectors between November 5, 2005 and September 30, 2007, and July 7, 2009 and October 20, 2010, with a total observational time of 283.0 days and 132.9 days, respectively. The search targets gravitational wave transients of duration 10-500 s in a frequency band of 40-1000 Hz, with minimal assumptions about the signal waveform, polarization, source direction, or time of occurrence. All candidate triggers were consistent with the expected background; as a result we set 90% confidence upper limits on the rate of long-duration gravitational wave transients for different types of gravitational wave signals. For signals from black hole accretion disk instabilities, we set upper limits on the source rate density between 3.4Ă10-5 and 9.4Ă10-4 Mpc-3 yr-1 at 90% confidence. These are the first results from an all-sky search for unmodeled long-duration transient gravitational waves. © 2016 American Physical Society
All-sky search for long-duration gravitational wave transients with initial LIGO
We present the results of a search for long-duration gravitational wave transients in two sets of data collected by the LIGO Hanford and LIGO Livingston detectors between November 5, 2005 and September 30, 2007, and July 7, 2009 and October 20, 2010, with a total observational time of 283.0 days and 132.9 days, respectively. The search targets gravitational wave transients of duration 10-500 s in a frequency band of 40-1000 Hz, with minimal assumptions about the signal waveform, polarization, source direction, or time of occurrence. All candidate triggers were consistent with the expected background; as a result we set 90% confidence upper limits on the rate of long-duration gravitational wave transients for different types of gravitational wave signals. For signals from black hole accretion disk instabilities, we set upper limits on the source rate density between 3.4Ă10-5 and 9.4Ă10-4 Mpc-3 yr-1 at 90% confidence. These are the first results from an all-sky search for unmodeled long-duration transient gravitational waves. © 2016 American Physical Society
Search for Tensor, Vector, and Scalar Polarizations in the Stochastic Gravitational-Wave Background
The detection of gravitational waves with Advanced LIGO and Advanced Virgo has enabled novel tests of general relativity, including direct study of the polarization of gravitational waves. While general relativity allows for only two tensor gravitational-wave polarizations, general metric theories can additionally predict two vector and two scalar polarizations. The polarization of gravitational waves is encoded in the spectral shape of the stochastic gravitational-wave background, formed by the superposition of cosmological and individually unresolved astrophysical sources. Using data recorded by Advanced LIGO during its first observing run, we search for a stochastic background of generically polarized gravitational waves. We find no evidence for a background of any polarization, and place the first direct bounds on the contributions of vector and scalar polarizations to the stochastic background. Under log-uniform priors for the energy in each polarization, we limit the energy densities of tensor, vector, and scalar modes at 95% credibility to Ω0T<5.58Ă10-8, Ω0V<6.35Ă10-8, and Ω0S<1.08Ă10-7 at a reference frequency f0=25 Hz. © 2018 American Physical Society
Learning support order for manipulation in clutter
IEEE Robotics and Automation Society (RAS);IEEE Industrial Electronics Society (IES);The Robotics Society of Japan (RSJ);The Society of Instrument and Control Engineers (SICE);New Technology Foundation (NTF)2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013 -- 3 November 2013 through 8 November 2013 -- Tokyo -- 102443Understanding positional semantics of the environment plays an important role in manipulating an object in clutter. The interaction with surrounding objects in the environment must be considered in order to perform the task without causing the objects fall or get damaged. In this paper, we learn the semantics in terms of support relationship among different objects in a cluttered environment by utilizing various photometric and geometric properties of the scene. To manipulate an object of interest, we use the inferred support relationship to derive a sequence in which its surrounding objects should be removed while causing minimal damage to the environment. We believe, this work can push the boundary of robotic applications in grasping, object manipulation and picking-from-bin, towards objects of generic shape and size and scenarios with physical contact and overlap. We have created an RGBD dataset that consists of various objects used in day-to-day life present in clutter. We explore many different settings involving different kind of object-object interaction. We successfully learn support relationships and predict support order in these settings. © 2013 IEEE